12. Slope After Cleaning


Question:

In outliers/outlier_cleaner.py, you will find the skeleton for a function called outlierCleaner() that you will fill in with a cleaning algorithm. It takes three arguments: predictions is a list of predicted targets that come from your regression, ages is the list of ages in the training set, and net_worths is the list of actual net worths in the training set. Each of these lists should have 90 elements, because the training set has 90 points. Your job is to return a list called cleaned_data that has only 81 elements: the 81 training points whose predictions are closest to their actual values (net_worths), i.e. the points with the smallest errors. In other words, you discard the 10% of points with the largest errors and keep 90 * 0.9 = 81. The format of cleaned_data should be a list of tuples, where each tuple has the form (age, net_worth, error).
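
A minimal sketch of one way to fill in outlierCleaner(), not the course's reference solution. It assumes the three inputs are equal-length sequences of plain numbers and that "error" means the squared residual (net_worth - prediction)**2. Ranking by absolute residual would select exactly the same 81 points, so that choice only changes the error value stored in each tuple. If your predictions come back from reg.predict as rows of one-element arrays, flatten them to plain numbers first.

```python
def outlierCleaner(predictions, ages, net_worths):
    """Keep the 90% of points with the smallest squared residual error.

    Returns a list of (age, net_worth, error) tuples, sorted by error.
    Assumes all three inputs are equal-length sequences of plain numbers.
    """
    # Pair each training point with its squared residual.
    # (Assumption: "error" = squared residual; ranking by the absolute
    # residual would keep exactly the same points.)
    data = [
        (age, net_worth, (net_worth - pred) ** 2)
        for pred, age, net_worth in zip(predictions, ages, net_worths)
    ]
    # Sort by error, smallest first.
    data.sort(key=lambda point: point[2])
    # Keep the first 90% using integer arithmetic (81 of 90 points here).
    keep = len(data) * 9 // 10
    cleaned_data = data[:keep]
    return cleaned_data
```

Sorting the whole list is O(n log n), which is more than enough for 90 points; heapq.nsmallest would also work if the data were much larger.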



Once this cleaning function is working, you should see the regression result change. What is the new slope? Is it closer to the “correct” result of 6.25?
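
To read off the new slope you refit the regression on cleaned_data. As a stdlib-only illustration (a swapped-in calculation, not the course script's own refit), the ordinary least-squares slope for a single feature with an intercept is cov(age, net_worth) / var(age), which is the same coefficient a one-feature linear regression reports. The function name ols_slope is hypothetical.

```python
def ols_slope(points):
    """Least-squares slope of net_worth on age for (age, net_worth, error) tuples.

    Fits net_worth = slope * age + intercept, the same model a
    one-feature linear regression with an intercept fits.
    """
    ages = [age for age, _, _ in points]
    worths = [net_worth for _, net_worth, _ in points]
    n = len(ages)
    mean_age = sum(ages) / n
    mean_worth = sum(worths) / n
    # slope = cov(age, net_worth) / var(age); the 1/n factors cancel.
    sxy = sum((a - mean_age) * (w - mean_worth) for a, w in zip(ages, worths))
    sxx = sum((a - mean_age) ** 2 for a in ages)
    return sxy / sxx
```

Calling ols_slope(cleaned_data) and ols_slope on the uncleaned points lets you compare how far each result is from 6.25.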


INSTRUCTOR NOTE:

NOTE: In outliers/outlier_removal_regression.py, in the section where outlier cleaning is performed (it starts with the comment ### identify and remove the most outlier-y points), make sure that the input argument to reg.predict is ages_train and not ages, so that you are cleaning based on the training data only. The arguments to the cleaner should also be the *_train variables.
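
Mixing full-data and training-only arrays is easy to miss, because zip() silently stops at the shortest input. A small guard like the hypothetical helper below (not part of the course files) can be called as check_cleaner_inputs(predictions, ages_train, net_worths_train) just before outlierCleaner(). Equal lengths do not prove you used the training data, but a mismatch reliably flags the mistake the note describes.

```python
def check_cleaner_inputs(predictions, ages, net_worths):
    """Hypothetical guard: raise ValueError unless all inputs share one length.

    zip() truncates to the shortest sequence, so a length mismatch would
    otherwise produce wrong cleaning results without any error.
    Returns the common length on success.
    """
    lengths = {
        "predictions": len(predictions),
        "ages": len(ages),
        "net_worths": len(net_worths),
    }
    if len(set(lengths.values())) != 1:
        raise ValueError(f"cleaner inputs differ in length: {lengths}")
    return lengths["ages"]
```

On Python 3.10 and later, writing zip(predictions, ages, net_worths, strict=True) inside the cleaner gives a similar check.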